
import Head from 'next/head'

<Head>
  <script>
    {
      `(function() {
         var _hmt = _hmt || [];
(function() {
  var hm = document.createElement("script");
  hm.src = "https://hm.baidu.com/hm.js?e60fb290e204e04c5cb6f79b0ac1e697";
  var s = document.getElementsByTagName("script")[0]; 
  s.parentNode.insertBefore(hm, s);
})();
       })();`
    }
  </script>
</Head>

![LangChain](https://pica.zhimg.com/50/v2-56e8bbb52aa271012541c1fe1ceb11a2_r.gif)





Runhouse
==================

[Runhouse](https://github.com/run-house/runhouse) 允许在环境和用户之间进行远程计算和数据处理。请参阅 [Runhouse docs](https://runhouse-docs.readthedocs-hosted.com/en/latest/)。

此示例介绍了如何使用LangChain和 [Runhouse](https://github.com/run-house/runhouse)，与托管在您自己的GPU上，或在AWS，GCP，AWS或Lambda上提供的按需GPU交互的模型。 

**注意**：此代码中使用 `SelfHosted` 而非 `Runhouse` 作为名称。

```python
!pip install runhouse

```

```python
from langchain.llms import SelfHostedPipeline, SelfHostedHuggingFaceLLM
from langchain import PromptTemplate, LLMChain
import runhouse as rh

```

```python
INFO | 2023-04-17 16:47:36,173 | No auth token provided, so not using RNS API to save and load configs

```

```python
# For an on-demand A100 with GCP, Azure, or Lambda
gpu = rh.cluster(name="rh-a10x", instance_type="A100:1", use_spot=False)

# For an on-demand A10G with AWS (no single A100s on AWS)
# gpu = rh.cluster(name='rh-a10x', instance_type='g5.2xlarge', provider='aws')

# For an existing cluster
# gpu = rh.cluster(ips=['<ip of the cluster>'], 
# ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},
# name='rh-a10x')

```

```python
template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])

```

```python
llm = SelfHostedHuggingFaceLLM(model_id="gpt2", hardware=gpu, model_reqs=["pip:./", "transformers", "torch"])

```

```python
llm_chain = LLMChain(prompt=prompt, llm=llm)

```

```python
question = "What NFL team won the Super Bowl in the year Justin Beiber was born?"

llm_chain.run(question)

```

```python
INFO | 2023-02-17 05:42:23,537 | Running _generate_text via gRPC
INFO | 2023-02-17 05:42:24,016 | Time to send message: 0.48 seconds

```

```python
"  Let's say we're talking sports teams who won the Super Bowl in the year Justin Beiber"

```

您还可以通过SelfHostedHuggingFaceLLM接口加载更多自定义模型：

```python
llm = SelfHostedHuggingFaceLLM(
    model_id="google/flan-t5-small",
    task="text2text-generation",
    hardware=gpu,
)

```

```python
llm("What is the capital of Germany?")

```

```python
INFO | 2023-02-17 05:54:21,681 | Running _generate_text via gRPC
INFO | 2023-02-17 05:54:21,937 | Time to send message: 0.25 seconds

```

```python
'berlin'

```

使用自定义加载函数，我们可以直接在远程硬件上加载自定义流水线：

```python
def load_pipeline():
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline  # Need to be inside the fn in notebooks
    model_id = "gpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    pipe = pipeline(
        "text-generation", model=model, tokenizer=tokenizer, max_new_tokens=10
    )
    return pipe

def inference_fn(pipeline, prompt, stop = None):
    return pipeline(prompt)[0]["generated_text"][len(prompt):]

```

```python
llm = SelfHostedHuggingFaceLLM(model_load_fn=load_pipeline, hardware=gpu, inference_fn=inference_fn)

```

```python
llm("Who is the current US president?")

```

```python
INFO | 2023-02-17 05:42:59,219 | Running _generate_text via gRPC
INFO | 2023-02-17 05:42:59,522 | Time to send message: 0.3 seconds

```

```python
'john w. bush'

```

您可以直接通过网络将您的流水线发送给您的模型，但这仅适用于小模型`（'<2 Gb'）`，并且速度较慢：
```python
pipeline = load_pipeline()
llm = SelfHostedPipeline.from_pipeline(
    pipeline=pipeline, hardware=gpu, model_reqs=model_reqs
)

```

相反，我们还可以将其发送到硬件的文件系统，这将更快。

```python
rh.blob(pickle.dumps(pipeline), path="models/pipeline.pkl").save().to(gpu, path="models")

llm = SelfHostedPipeline.from_pipeline(pipeline="models/pipeline.pkl", hardware=gpu)

```

